Estimating good discrete partitions from observed data: 
symbolic false nearest neighbors 
A symbolic analysis of observed time series data requires making a discrete partition of a continuous
state space containing observations of the dynamics. A particular kind of partition, called
"generating", preserves all dynamical information of a deterministic map in the symbolic representation,
but such partitions are not obvious beyond one dimension, and existing methods to find
them require significant knowledge of the dynamical evolution operator or the spectrum of unstable
periodic orbits. We introduce a statistic and algorithm to refine empirical partitions for symbolic
state reconstruction. This method optimizes an essential property of a generating partition: avoiding
topological degeneracies. It requires only the observed time series and is sensible even in the
presence of noise when no truly generating partition is possible. Because of its resemblance to a
geometrical statistic frequently used for reconstructing valid time-delay embeddings, we call the
algorithm "symbolic false nearest neighbors".

PACS numbers: 05.45b 
Why might one want to represent observed time series
of dynamical systems as sequences of low-precision discrete
symbols? In this representation, there are interesting
techniques, often (but not exclusively) derived
from information theory and its associated technology,
which may illuminate data in novel ways [?]. The initial
step for all these methods requires making a partition:
a coloring of the state space, $\mathbf{x} \in \mathbb{R}^d$, into
non-overlapping regions and associated symbols so that
any $\mathbf{x}$ is assigned a unique symbol $s$ in a discrete alphabet.
The symbol may be represented as an integer in the
set $\{0, 1, \ldots, A-1\}$. A partition $\mathcal{P}$ defines a discretization
of the observed sequence $\mathbf{x}_i$, $i = 1 \ldots N$, into a symbolic
sequence $s_i$, $i = 1 \ldots N$.

What partitions are "good"? Which discretizations retain
the full structure of the original dynamics in the $\mathbf{x}$
space in the sequence of symbols? Unfortunately, the
situation is unlike the remarkable time-delay embedding
method for continuous dynamics: simple partitions are
not generically satisfactory. The mathematics of symbolic
dynamics specifies what we want: a "generating
partition" (GP), where each symbolic orbit uniquely identifies
one continuous-space orbit, and thus the symbolic
dynamics is fully equivalent to the continuous-space
dynamics.

Unfortunately, there is no satisfactory mathematical
theory of how to find a GP as a general procedure (except
for one-dimensional dynamics, $d = 1$, where partitioning
at the critical points works). Are the ad hoc partitions
often used still satisfactory? Often they are not.
Bollt et al. [?] examined the degradation
in the symbolic dynamics which results from the frequently
used "histogram partition", as opposed to a GP.
A suboptimal partition will induce improper projections
or degeneracies, where a given symbolic segment may
correspond to more than one topologically distinct state
space orbit. This resulted in finding the wrong topological
entropy. Chaotic communication with symbolic targeting
works most satisfactorily knowing a GP, because
then the transmitted symbolic message may be directly
mapped into a desired orbit in the attractor (see, e.g., [?]).

Since a partition is a critical first step for any symbolic
data analysis, and a poor partition yields poor results,
a method to approximate good partitions from observed
data alone is urgently needed. With apparently no existing
satisfactory solutions, this is the problem we attack.
Davidchack et al. [?] recently presented a partitioning
method which works by successively coloring unstable
periodic orbits (UPOs) to ensure unique codings
(all UPOs have unique codes under a GP). Unfortunately,
the necessary high-order UPOs are very difficult to obtain
from observed data alone.

The Kolmogorov-Sinai entropy rate $h_{KS}$ of the dynamics
can be found from the supremum, over all increasingly
fine partitions, of Shannon's entropy rate evaluated
on the information source implied by the discretization.
More strikingly, a GP also achieves this supremum with
a finite and, one hopes, small alphabet. This suggests
a naive strategy whereby one maximizes a statistical
estimator of the entropy rate evaluated on the sequence
induced by candidate partitions [?]. This apparently
attractive idea is flawed, as demonstrated by the following
counterexample. Consider a partition of the state
space with a fine box size $\epsilon$ where each region is
randomly assigned a symbol from the alphabet. For sufficiently
small $\epsilon$, the symbol sequence of any finite time
series will appear indistinguishable from a memoryless
and structureless information source with the maximum
possible entropy rate $h = \log_2 A$, since each observed
datum could have encountered a different partition element
with a new random symbol. With the highest entropy
possible this partition would be selected over competitors,
but it is clearly useless for data analysis, as the resulting
symbolic stream says nothing about the original time series
or its particular dynamics. Even if one believes this
pathology to be irrelevant to coarser encodings, there are
practical problems with the maximum-estimated-entropy
idea. First, estimation of $h$ is not trivial to do well;
second, when there is observational noise (inevitable with
data acquisition equipment) larger alphabets will appear
to give significantly higher entropies even if they are not
actually much better at encoding the dynamics. As the
true entropy rate of the system is not already known (and
is often a key quantity one wants to estimate given a good
partition), there is no absolute statistical target which
confirms whether the proposed partition is close to or
far from the ideal. In practice, selecting partitions by
entropy does not seem to work well in general.
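The counterexample can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a logistic map with parameter r = 3.8 as a stand-in data source, a box size of 1e-5, and a naive plug-in estimate of the entropy rate from one- and two-symbol block entropies. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def logistic_series(n, r=3.8, x0=0.3):
    """Iterate x -> r x (1 - x); r and x0 are illustrative assumptions."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs

def random_box_symbols(xs, eps, alphabet, seed=0):
    """Encode xs with boxes of width eps, each box carrying a random symbol.

    Symbols are drawn lazily on the first visit to a box, which is
    equivalent to pre-assigning an independent random symbol to every box.
    """
    rng = random.Random(seed)
    box_symbol, symbols = {}, []
    for x in xs:
        box = int(x / eps)
        if box not in box_symbol:
            box_symbol[box] = rng.randrange(alphabet)
        symbols.append(box_symbol[box])
    return symbols

def block_entropy(symbols, length):
    """Plug-in Shannon entropy (bits) of overlapping words of a given length."""
    words = Counter(tuple(symbols[i:i + length])
                    for i in range(len(symbols) - length + 1))
    total = sum(words.values())
    return -sum(c / total * math.log2(c / total) for c in words.values())

def entropy_rate_estimate(symbols):
    """Naive estimate H_2 - H_1: entropy of a symbol given its predecessor."""
    return block_entropy(symbols, 2) - block_entropy(symbols, 1)

xs = logistic_series(5000)
fine = random_box_symbols(xs, eps=1e-5, alphabet=2)
coarse = [0 if x < 0.5 else 1 for x in xs]  # split at the critical point
h_fine = entropy_rate_estimate(fine)
h_coarse = entropy_rate_estimate(coarse)
```

Under these assumptions the random fine partition scores close to the maximum of one bit per symbol, well above the estimate for the critical-point partition, although only the latter reflects the dynamics.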

We assert our practical criterion for a good partition: 
short sequences of consecutive symbols ought to localize 
the corresponding continuous state space point as well as 
possible. A good coding ought to maintain the benefits 
of a low-precision symbolic representation with minimum 
distortion of the original state space dynamics. Our cen- 
tral idea is to form a particular geometrical embedding 
of the symbolic sequence under the candidate partition 
and evaluate, and minimize, a statistic which quantifies 
the apparent errors in localizing state space points. 

We embed the symbol sequence into the unit square:

$$\mathbf{y}_i = \left( \sum_{k=1}^{k_{\max}} \frac{s_{i-(k-1)}}{A^k},\ \sum_{k=1}^{k_{\max}} \frac{s_{i+k}}{A^k} \right). \qquad (1)$$

($k_{\max}$ is chosen such that $A^{-k_{\max}}$ is as small as the computational
precision.) For a binary alphabet ($A = 2$), the
first coordinate of $\mathbf{y}_i$ is the binary fraction whose digits
start at $s_i$ and go backwards in time; the second is the same with
the sequence going forward from $s_{i+1}$. Intuitively, the
distribution on $\mathbf{y}$ is like a $\mathcal{P}$-dependent symbolic version
of the invariant measure.
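As an illustration of Eq. (1), the sketch below computes the embedded point for every index that has enough past and future symbols. The function name, the dictionary return type, and the default `k_max` are assumptions made for this example.

```python
def symbolic_embedding(symbols, alphabet, k_max=32):
    """Map index i to (backward fraction, forward fraction) as in Eq. (1).

    Only indices with k_max symbols of history (including s_i) and
    k_max symbols of future are embedded.
    """
    points = {}
    n = len(symbols)
    for i in range(k_max - 1, n - k_max):
        # Digits s_i, s_{i-1}, ... going backwards in time.
        backward = sum(symbols[i - (k - 1)] / alphabet ** k
                       for k in range(1, k_max + 1))
        # Digits s_{i+1}, s_{i+2}, ... going forwards in time.
        forward = sum(symbols[i + k] / alphabet ** k
                      for k in range(1, k_max + 1))
        points[i] = (backward, forward)
    return points
```

For the binary sequence 1, 0, 1, 1, 0, 0, 1, 0 with `k_max=3`, index 2 has backward digits 1, 0, 1 and forward digits 1, 0, 0, so it maps to (0.625, 0.5).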

Given $\mathbf{x}_i$ and a partition $\mathcal{P}$, the symbolic embedding (1)
yields a parallel series $\mathbf{y}_i$, defining points on
some map $\mathbf{y} = \phi_{\mathcal{P}}(\mathbf{x})$. We want $\phi_{\mathcal{P}}$ to be injective, i.e.
$\phi_{\mathcal{P}}(\mathbf{x}) = \phi_{\mathcal{P}}(\mathbf{x}')$ implies $\mathbf{x} = \mathbf{x}'$. With finite data, we desire
that if $\|\phi_{\mathcal{P}}(\mathbf{x}) - \phi_{\mathcal{P}}(\mathbf{x}')\|$ is small, so is $\|\mathbf{x} - \mathbf{x}'\|$. By
construction, sufficiently near points in $\mathbf{x}$ have close symbolic
sequences in their most significant digits. In a good
partition, additionally, nearby points in $\mathbf{y}$ remain close
when mapped back into the $\mathbf{x}$-space. By contrast, bad
partitions induce topological degeneracies where similar
symbolic words map back to globally distant regions of
state space. As shown in [?], this phenomenon confounds
proper analysis of the observed symbolic dynamics.

We need to quantify how well any candidate partition
achieves our ideal. We find the nearest neighbor,
in Euclidean distance, to each point $\mathbf{y}_i$. Conventional
k-d tree algorithms efficiently provide the index
of the nearest neighbor to any point in a data set:
$\mathcal{N}[i] = \arg\min_{k \neq i} \|\mathbf{y}_k - \mathbf{y}_i\|$. Knowing symbolic neighbors,
we find distances of those same points back in $\mathbf{x}$-space,
$D_i = \|\mathbf{x}_{\mathcal{N}[i]} - \mathbf{x}_i\|$. We normalize the set of $D_i$ by
a monotonic transformation: given any $D$, find its rank
$R \in [0, 1]$ in the cumulative distribution of random two-point
distances $\|\mathbf{x}_q - \mathbf{x}_r\|$. Large $R$ means that localizing
well in symbol space did not localize well in the original
state space.
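The neighbor search and rank normalization might be sketched as follows. This example substitutes a brute-force search for the k-d tree to stay dependency-free, and it estimates the cumulative distribution of two-point distances from a random sample of pairs; the function names, `n_pairs`, and `seed` are assumptions of the sketch.

```python
import bisect
import math
import random

def nearest_neighbor_indices(ys):
    """Brute-force N[i] = argmin over k != i of ||y_k - y_i||."""
    neighbors = []
    for i, yi in enumerate(ys):
        best, best_d = None, math.inf
        for k, yk in enumerate(ys):
            if k != i:
                d = math.dist(yk, yi)
                if d < best_d:
                    best, best_d = k, d
        neighbors.append(best)
    return neighbors

def rank_normalized_distances(xs, neighbors, n_pairs=10000, seed=0):
    """Return R_i in [0, 1]: rank of D_i among random two-point distances."""
    rng = random.Random(seed)
    n = len(xs)
    # Empirical distribution of distances between randomly drawn pairs.
    reference = sorted(
        math.dist(xs[a], xs[b])
        for a, b in (rng.sample(range(n), 2) for _ in range(n_pairs))
    )
    ranks = []
    for i, j in enumerate(neighbors):
        d_i = math.dist(xs[j], xs[i])  # D_i measured back in x-space
        ranks.append(bisect.bisect_right(reference, d_i) / len(reference))
    return ranks
```

Both `ys` and `xs` are sequences of equal-length tuples; the brute-force search costs O(N^2) and would be replaced by a tree-based query for realistic data sizes.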

Better partitions give a smaller proportion of symbolic
false nearest neighbors, that fraction of the $R_i$ which
are greater than some threshold $\eta$, denoted $J_{\mathrm{sfnn}}$. This
resembles the false neighbors statistic for time-delay
embeddings [9]: both count large-deviation "mistakes"
in a related space which result from topological mis-embedding
in the tested space. Appropriate values for
$\eta$, which defines a "large" deviation, are $\eta \approx 0.01$ to $0.3$,
depending on the noise in $\mathbf{x}_i$. An alternative to $J_{\mathrm{sfnn}}$ is
$K_{\mathrm{sfnn}}$, defined as the arithmetic average of the largest $\gamma$
percentile of the set of $R_i$. Using $J_{\mathrm{sfnn}}$, $\eta$ may need tuning
depending on the noise scale and dynamical system,
but the effect of changing $\gamma$ is lower. On the downside,
$K_{\mathrm{sfnn}}$ does not necessarily converge to near zero for the
optimal partition. We typically find good results with
$\gamma \approx 0.01$ to $0.05$.
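Given the rank-normalized distances, the two statistics reduce to a few lines. This sketch reads the "largest percentile" as the top fraction gamma of the values and always keeps at least one value; both choices are assumptions of the example, and the function names are hypothetical.

```python
def j_sfnn(ranks, eta):
    """Fraction of rank-normalized distances R_i that exceed eta."""
    return sum(1 for r in ranks if r > eta) / len(ranks)

def k_sfnn(ranks, gamma):
    """Arithmetic mean of the largest gamma fraction of the R_i."""
    count = max(1, round(gamma * len(ranks)))  # keep at least one value
    top = sorted(ranks, reverse=True)[:count]
    return sum(top) / count
```

The first statistic counts threshold crossings, so it depends directly on the chosen threshold, whereas the second averages the worst cases and varies more smoothly with its parameter.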

For concrete numerical calculations, we need to parameterize
partitions with a relatively small number of free
parameters. Inspired by [?], we define partitions with respect
to a set of radial-basis "influence" functions of the
form $f_k(\mathbf{x}) = \alpha_k / \|\mathbf{x} - \mathbf{z}_k\|^2$, the set of $\alpha$ and $\mathbf{z}$ being the
free variables. For any particular $\mathbf{x}$, one $f_l(\mathbf{x})$ will generically
result in the largest value versus the other $f_k(\mathbf{x})$, $k \neq l$,
and then $\mathbf{x}$ is assigned to that symbol which was pre-assigned
to influence function $f_l$. The $\mathbf{z}$ parameters are
initialized to random examples from the $\mathbf{x}_i$ and the $\alpha$ to
independent random variates in $[0, 1)$, with $n_f$ functions assigned
to each of the $A$ symbols. In [?] the $\mathbf{z}_k$ were fixed on the
UPOs and their symbols varied; here, the centers and coefficients
vary but their symbols are fixed. We minimize
$J_{\mathrm{sfnn}}$ or $K_{\mathrm{sfnn}}$ over the $A n_f (d + 1)$ free parameters using
"differential evolution" [10], a genetic algorithm suitable
for continuous parameter spaces.
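The parameterization can be sketched as below. The example assumes a squared-distance denominator in the influence functions and shows only the symbol assignment and the random initialization; the optimizer itself is omitted. Function and variable names are hypothetical.

```python
import math
import random

def assign_symbol(x, centers, alphas, center_symbols):
    """Return the symbol of the influence function that is largest at x.

    Each function is f_k(x) = alpha_k / ||x - z_k||^2 (exponent assumed).
    """
    best_k, best_f = 0, -math.inf
    for k, (z, a) in enumerate(zip(centers, alphas)):
        d2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
        f = math.inf if d2 == 0.0 else a / d2
        if f > best_f:
            best_k, best_f = k, f
    return center_symbols[best_k]

def random_partition(data, alphabet, n_f, seed=0):
    """Initial guess: centers at random data points, alphas uniform in [0, 1)."""
    rng = random.Random(seed)
    n_funcs = alphabet * n_f
    centers = [rng.choice(data) for _ in range(n_funcs)]
    alphas = [rng.random() for _ in range(n_funcs)]
    # Symbols stay fixed during optimization: n_f functions per symbol.
    center_symbols = [k // n_f for k in range(n_funcs)]
    return centers, alphas, center_symbols
```

An optimizer would then vary `centers` and `alphas` (d + 1 numbers per function) to minimize the chosen statistic, re-encoding the data with `assign_symbol` for each candidate.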

Figure 1 shows the final $\mathcal{P}$ on 2000 data points from the
Ikeda map [11]. It shows the best result (lowest $J_{\mathrm{sfnn}}$) out
of six restarts changing only the random seed governing
the initial conditions; the results were not much worse on
the other runs, however. The result is very close to the
partition found knowing the dynamics.

In a stationary information source, the number of distinct
length-$p$ codewords will scale, for asymptotic $p$, as
$N_p \propto e^{p h_T}$, where $h_T$ is the topological entropy, a dynamical
invariant. We validate $\mathcal{P}$ with an estimate of
the deficiency between the $h_T$ implied by $\mathcal{P}$ and the correct

FIG. 1: Left: partition estimated by SFNN optimization on
2000 data points from the Ikeda map. Right: partition calculated
with foreknowledge of UPOs, numerically extracted
from the equation of motion. The partition we estimate from
observed data alone is quite close to a presumably correct
one, calculated with the method of [?]. The measure on the
two figures is not the same: the left figure is a sample of the
natural measure, whereas the right shows UPOs up to period
16. They avoid regions of homoclinic tangencies, contributing
to the blank spaces.



FIG. 3: Minimizing $K_{\mathrm{sfnn}}$ with $\gamma = 0.01$: estimated partition
for a time series of 5000 data points from the Lozi map with
10% (by amplitude) additive Gaussian noise. Either the $x_1$ or
the $x_2$ axis is a GP for the noiseless map. Here, despite the noise,
the algorithm finds a partition close to what would be a GP.

FIG. 2: For each new best partition: minimization target
$J_{\mathrm{sfnn}}$ (circles, right scale) and estimated deficiency in topological
entropy $\delta h_T$ (asterisks, left scale). Minimizing $J_{\mathrm{sfnn}}$
generally minimizes $\delta h_T$ and thus maximizes the topological
entropy of the symbolic language.

$h_T$: $\delta h_T = p_{\max}^{-1} \sum_{p=1}^{p_{\max}} p^{-1} \log (N_p / \tilde{N}_p)$. $N_p$ is the
number of distinct period-$p$ UPOs (which were computed
knowing the equations of motion), $\tilde{N}_p$ the number of such
UPOs with unique $p$-symbol codes in some $\mathcal{P}$. A GP gives
$\delta h_T = 0$, and $\delta h_T$ is smaller for better (less UPO-degenerate)
partitions. Figure 2 shows $\delta h_T$ on each new best partition
found during the optimization. The optimization target,
$J_{\mathrm{sfnn}}$, decreases strictly monotonically by construction;

FIG. 4: Top: estimated binary partition for time-delay embedding
of inter-bubble time intervals (arbitrary units), minimizing
$K_{\mathrm{sfnn}}$. Bottom: $J_{\mathrm{sfnn}}(\eta)$ vs $\eta$ for the optimized partition
(circles) and for a naive equiprobable histogram partition
with the same alphabet (asterisks). For the optimized
partition there are very few large distance errors, as seen in
$J_{\mathrm{sfnn}}$ above $\eta = 0.1$.

though $\delta h_T$ does not decrease quite monotonically, the
trend toward very small values is clear. This gives evidence
that minimizing $J_{\mathrm{sfnn}}$ also refines approximations
to GPs.
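The deficiency used for validation can be computed from two lists of UPO counts. The sketch below assumes the reading that the deficiency is the average over periods p = 1, ..., p_max of (1/p) log(N_p / Ñ_p), with natural logarithms and with every count strictly positive; the function and argument names are hypothetical.

```python
import math

def entropy_deficiency(n_upos, n_unique):
    """Average over periods of (1/p) * log(N_p / N~_p).

    n_upos[p-1]   : number of distinct period-p UPOs, N_p
    n_unique[p-1] : number of those with unique p-symbol codes, N~_p
    """
    if not n_upos or len(n_upos) != len(n_unique):
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for p, (n_p, n_tilde_p) in enumerate(zip(n_upos, n_unique), start=1):
        if n_p <= 0 or n_tilde_p <= 0:
            raise ValueError("counts must be positive for every period")
        total += math.log(n_p / n_tilde_p) / p
    return total / len(n_upos)
```

When every UPO has a unique code the two lists coincide and the function returns zero, which is the value expected for a GP.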

Figures 3 to 5 demonstrate applications of the algorithm.
Fig. 3 shows the effect of noise on a system where the GP

FIG. 5: Same as Fig. 4 but with combustion engine heat
release time series (energy, arbitrary units), and $A = 3$. The
noise level is higher, thus there remain more moderately sized
distances, even with a larger alphabet, which usually results
in better localization.

is analytically known. The Lozi map (see analysis in [?])
is similar to the Hénon map but replaces the quadratic
nonlinearity with a piecewise linear one. We find a partition
which is close to the noise-free GP even when the
data have been contaminated by significant amounts of
additive noise. Though complete localization to a single
point is not possible here, minimizing large divergences is
still a desirable criterion. Figures 4 and 5 show estimated
partitions on experimental data sets where no analytical
form of the equations (much less partitions) is known.
On account of noise, dynamical or observational, a certain
amount of divergence $D$ for symbolic nearest neighbors
is inevitable. Still, minimizing large deviations is a
reasonable goal even for noisy data. There are very few
rank distances with $R > 0.2$ or $0.3$ compared to a basic
histogram partition.

It must be kept in mind that GPs are not necessarily
unique for any attractor. Distinctly different partitions
may be found, all of which are reasonably satisfactory.
(At a minimum, any iterate of a GP is also a GP.) There
are many coexisting solutions (roughly like finding low-energy
states of a spin glass), which is why the optimization
problem is hard, requiring a global search method.
We conjecture this is one reason why understanding the
structure of generating partitions has been so difficult for
mathematicians.

We also point out that it is possible to partition one
space $\mathbf{x}$, but quantify distances $D_i = \|\mathbf{z}_{\mathcal{N}[i]} - \mathbf{z}_i\|$ in
some different space, as long as there is some relation
between each $\mathbf{x}_i$ and $\mathbf{z}_i$. For example, one may be interested
in a simple symbolic control scheme, say where
$\mathbf{z}_i = \mathbf{x}_{i+T}$ (find the partition of observables now
that best predicts some future), or perhaps when two
different variables are measured simultaneously and one
wants to cross-predict. In the first case, in principle, a
GP should be optimal for predicting the future as well
as the present, but the inevitable issues of finite data and
noise may make the best empirical partition different for
the two cases. In the second case, generating partitions
are irrelevant entirely.